Viruses in the families Arteriviridae and Coronaviridae have enveloped virions which contain nonsegmented,
positive-stranded RNA, but the constituent genera differ markedly in genetic complexity and
virion structure. Nevertheless, there are striking resemblances among the viruses in the organization
and expression of their genomes, and sequence conservation among the polymerase polyproteins
strongly suggests that they have a common ancestry. On this basis, the International Committee on
Taxonomy of Viruses recently established a new order, Nidovirales, to contain the two families. Here,
the common traits and distinguishing features of the Nidovirales are reviewed. © 1997 Academic Press
INTRODUCTION


The Nidovirales (summarized in Table 1) is a newly 
established order comprising the families Arteriviri- 
dae (genus Arterivirus) and Coronaviridae (genera 
Coronavirus and Torovirus). Species in the genus Corona- 
virus can be grouped into three clusters on the basis of 
serological and genetic properties (1). Two torovirus 
species have been recognized: the equine and bovine 
toroviruses (ETV, Berne virus; and BoTV, Breda virus). 
In addition, a human torovirus is thought to exist (2) 
and we have recently identified a porcine torovirus 
(PoTV) (Kroneman et al., unpublished). The genus 
Arterivirus presently contains four species. 

Despite considerable differences in genetic complex- 
ity and virion architecture, coronaviruses, toroviruses, 
and arteriviruses are strikingly similar in genome 
organization and replication strategy (3) (Fig. 1). The 
name Nidovirales (from the Latin nidus, nest) refers to 
the 3’ coterminal nested set of subgenomic (sg) viral 
mRNAs that is produced during infection. Sequence 
similarities, although mostly restricted to the 1b polyprotein
(POL1b) from which the replicase-associated
proteins are derived, suggest that the Nidovirales have 
evolved from a common ancestor. Apparently their 
divergence has been accompanied by extensive ge- 
nome rearrangements through heterologous RNA re- 
combination. 

Here, we review the common traits and distinguishing
features of the genome organization, gene expression,
and evolution of the Nidovirales. Other reviews can be
found in references 3 to 9, and the different models
proposed for sg mRNA synthesis are discussed in
references 8 to 10.


VIRION ARCHITECTURE 
AND STRUCTURAL PROTEINS 


The phylogenetic relationship among arteriviruses, 
toroviruses, and coronaviruses is not apparent from 
their morphology. Coronavirions are roughly spheri- 
cal, 100-120 nm in diameter, with a fringe of c. 
20-nm-long petal-shaped spikes. Some group II corona- 
viruses exhibit a second fringe of smaller surface 
projections about 5 nm in length. Torovirus particles
are pleiomorphic, measuring 120 to 140 nm in their
largest axis; spherical, oval, elongated, and kidney-shaped
virions have been described. The surface projections
on torovirus virions closely resemble coronavirus
peplomers (11). Arterivirions are only 50-70 nm in
diameter and lack large surface projections. Instead,
cup-like structures with a diameter of 10 to 15 nm have
been observed (12). The differences in virion architecture
become even more apparent when comparing the
nucleocapsid structures. That of coronaviruses is a
loosely wound helix (13), that of toroviruses is a
compact tubular structure (11), and that of arteriviruses
is isometric, about 25-35 nm in diameter, and
possibly icosahedral (12). The nucleocapsid proteins
(N) differ considerably in size (c. 50, 19, and 14 kDa for
corona-, toro-, and arteriviruses, respectively) and
amino acid sequence.

TABLE 1
Order: Nidovirales

Family         Genus        Group  Species                                              Abbreviation
Arteriviridae  Arterivirus         Equine arteritis virus                               EAV
                                   Porcine reproductive and respiratory syndrome virus  PRRSV
                                   Lactate dehydrogenase-elevating virus                LDV
                                   Simian hemorrhagic fever virus                       SHFV
Coronaviridae  Torovirus           Equine torovirus                                     ETV
                                   Bovine torovirus                                     BoTV
                                   Porcine torovirus                                    PoTV
               Coronavirus  I      Transmissible gastroenteritis virus                  TGEV
                            I      Feline coronavirus                                   FCoV
                            I      Canine coronavirus                                   CCV
                            I      Human coronavirus 229E                               HCV 229E
                            I      Porcine epidemic diarrhea virus                      PEDV
                            II     Mouse hepatitis virus                                MHV
                            II     Bovine coronavirus                                   BCV
                            II     Human coronavirus OC43                               HCV OC43
                            II     Porcine hemagglutinating encephalomyelitis virus     HEV
                            II     Sialodacryoadenitis virus                            SDAV
                            II     Turkey coronavirus                                   TCV
                            III    Infectious bronchitis virus                          IBV

The compositions of the viral envelopes also differ. 
Coronavirus membranes contain: (i) 180- to 220-kDa 
spike protein (S), (ii) 25- to 30-kDa triple-spanning membrane
protein M, and (iii) c. 10-kDa transmembrane
protein E, a minor virion component but essential for 
virus assembly (14,15). The small surface projections of 
group II coronaviruses are dimers of a 65-kDa class I
membrane protein, the hemagglutinin-esterase (HE), 
possibly acquired by heterologous RNA recombina- 
tion (16,17). 

Toroviruses also specify M and S proteins of 26 and 
180 kDa, respectively. Although different in sequence, 
the M and S proteins of toro- and coronaviruses are 
alike in size, structure, and function. The M proteins 
have a similar triple-spanning membrane topology
(18), and the heptad repeats, indicative of a coiled-coil
structure in the spike proteins of coronaviruses (19), 
are also present in the torovirus peplomer (20). Thus, 
the S and M genes of these viruses may well be phylo- 
genetically related (6,18,20). Puzzlingly, toroviruses 
seem to lack a homologue for the E protein, which 
could indicate a difference in assembly. We have found 
recently that BoTV virions contain a third membrane 
protein, the 65-kDa hemagglutinin-esterase (145). 

The structural proteins of arteriviruses are unrelated 
to those of the Coronaviridae. There is a basic set of
three envelope proteins (21-24): (i) a 16- to 20-kDa
nonglycosylated membrane protein (M) which traverses
the membrane three times and thus structurally
resembles the M protein of corona- and toroviruses, (ii)
a heterogeneously N-glycosylated triple-spanning protein
(designated GL for EAV) of variable size, and (iii) a
class I glycoprotein of 25-30 kDa (designated GS for
EAV) which is a minor virion component. The GL and
M proteins associate into disulphide-linked heterodimers
and probably form the cup-like structures
on the virion surface (24-26).


FIG. 1. (a) Scale representation of archetypical Nidovirales genomes. The torovirus genome organization is based on combined data for ETV
and BoTV, and that of coronavirus is typical for a group II member (Table 1). The 5’ ends of ORF1b have been aligned. The bottom panel illustrates
the 3’ coterminal nested set of mRNAs produced during coronavirus infection. ORFs are represented by boxes. The coding assignments are also
indicated. Hatched boxes represent the ORFs for HE and the 30-kDa ns2a protein of group II coronaviruses and related sequences in the torovirus
genome. The arrow indicates the position of the pseudoknot structures required for translational read-through of ORF1b. The 5’ leader
sequences are depicted by a small black box. Poly(A) tails are indicated by An. (b) Sequence conservation in the POL1b polyproteins. Conserved
domains are indicated by hatching. RdRp, Zf, and H indicate the RNA-dependent RNA polymerase, zinc finger, and helicase motifs,
respectively. Domains 1-3 indicate conserved regions for which as yet no function has been suggested. Motif 2 corresponds to the previously
described CVL domain. Motif 1 has not been described before. Bracketed lines indicate (predicted) proteolytic cleavage sites (for details see text).
(c) Sequence conservation in motif 1. Sequences were taken from (32,34,36,41,144). Residues conserved between corona- and toroviruses are
boxed.


GENES AND REGULATORY ELEMENTS

Overall Genome Structure

Nidoviral genome RNA is single-stranded, infectious,
polyadenylated (27-29), and, at least for arteri- and
coronaviruses, 5’ capped (30,31). Nucleotide sequences
are known for the complete RNA of coronaviruses
MHV, IBV, TGEV, and HCV 229E and arteriviruses
EAV, LDV, and PRRSV (32-39) and for parts of the RNA of
several other Nidovirales, including ETV strain Berne
(40,41) and SHFV (Godeny et al., in press). The size of
the arterivirus genome is from 13 to 15 kb. The
genomes of toroviruses and coronaviruses are considerably
larger (up to 31 kb) and include the largest
known RNA genomes. Despite the differences in genetic
complexity and gene composition, the genome
organizations of arteri-, toro-, and coronaviruses are
remarkably similar. More than two-thirds of each
genome is taken up by two huge overlapping open
reading frames (ORFs), designated ORF1a and 1b. The
more downstream, ORF1b, is only expressed after
translational read-through via a -1 frameshift mediated
by a pseudoknot structure (42). The polypeptides
encoded by these ORFs are proteolytically cleaved by
virus-encoded proteinases to yield the proteins involved
in viral RNA synthesis.
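
As a rough illustration of what a -1 frameshift does to the reading frame, the following Python sketch splits a toy RNA into codons and slips back one nucleotide at a chosen position. The sequence, the slip position, and the function name are invented for illustration; the sketch does not model the pseudoknot or any real ORF1a/ORF1b junction.

```python
def read_codons(rna, slip_after=None):
    """Split rna into codons, starting at position 0.

    If slip_after is given (a multiple of 3), the reader moves back one
    nucleotide once, after that many nucleotides have been read,
    mimicking a -1 frameshift.
    """
    codons = []
    pos = 0
    slipped = False
    while pos + 3 <= len(rna):
        codons.append(rna[pos:pos + 3])
        pos += 3
        if slip_after is not None and not slipped and pos == slip_after:
            pos -= 1  # re-read the last nucleotide: -1 frame from here on
            slipped = True
    return codons


# Hypothetical toy message; it is not taken from any viral genome.
toy_rna = "AUGGCUUUAAACGGGUGA"
zero_frame = read_codons(toy_rna)
shifted = read_codons(toy_rna, slip_after=9)
```

In the zero frame the toy message ends at a UGA stop codon, whereas after the slip the reader is in the -1 frame and passes that position without meeting a stop. This is the sense in which a downstream ORF is reached only by frameshifting.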

Downstream of ORF1b, there are four to nine genes 
that encode the structural proteins and, at least for 
coronaviruses, a number of nonstructural proteins. 
These genes are expressed from a 3’ coterminal nested 
set of sg mRNAs (8,40,43,44). Although these mRNAs 
are structurally polycistronic, translation is restricted 
to the unique 5’ sequences not present in the next 
smaller RNA of the set. Cells infected by arteriviruses 
or coronaviruses contain negative-stranded RNAs 
which correspond to each mRNA and which may 
serve as templates for transcription (45-49). 
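
The nested-set arrangement can be sketched in a few lines of Python. The gene names S, E, M, and N come from the text, but the coordinates and genome length below are hypothetical, and the sketch ignores both the 5’ leader and any overlap between ORFs. It only shows that every mRNA runs to the common 3’ end and that the translated part is the 5’ region absent from the next smaller mRNA.

```python
def nested_mrna_set(genome_length, body_starts):
    """Return one record per sg mRNA, longest first.

    body_starts maps a gene name to the genome position where the body
    of the corresponding mRNA begins. Every mRNA extends to the 3' end
    of the genome; its unique 5' region ends where the next smaller
    mRNA begins.
    """
    ordered = sorted(body_starts.items(), key=lambda item: item[1])
    records = []
    for index, (gene, start) in enumerate(ordered):
        if index + 1 < len(ordered):
            unique_end = ordered[index + 1][1]
        else:
            unique_end = genome_length
        records.append({
            "translated_gene": gene,
            "mrna_span": (start, genome_length),
            "unique_5prime_region": (start, unique_end),
        })
    return records


# Hypothetical coordinates, chosen only to make the example readable.
toy_layout = {"S": 20000, "E": 24000, "M": 24500, "N": 25500}
toy_set = nested_mrna_set(27000, toy_layout)
```

Each record pairs an mRNA with the single gene expected to be translated from it, although the mRNA itself also carries all downstream genes.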


Sequence Elements Regulating Transcription 


Each transcription unit (comprising one or more 
genes expressed from a single mRNA species) is 
preceded by a short consensus sequence, the comple- 
ment of which is thought to function as a promoter: the 
transcription-associated sequence (TAS) (3,10,50). The 
relative strength of coronavirus promoters is influ- 
enced by the primary structure of the TAS (10,50,51) 
and the presence downstream of other TASs. In gen- 
eral, downstream TASs have a negative effect on 
transcription levels from upstream sites (52-54). For 
MHV, host proteins of 35 and 38 kDa have been 
identified that specifically bind to the TAS and may 
serve as transcription factors (9,55,56). 

The sg mRNAs of corona- and arteriviruses carry
5’ leader sequences of 55-92 and about 200 nt, respectively,
which are derived from the 5’ ends of the viral
genomes. mRNA synthesis thus requires, at least
at one point, a discontinuous transcription event (43,44).
The fusion of “leader” and “body” sequences occurs
within or in close proximity to the TAS (10,49,57,58).
Puzzlingly, the torovirus mRNAs seem to lack an 
extensive 5’ leader sequence (40,59). Thus if the use of 
a leader sequence evolved before the divergence of the 
Nidovirales, toroviruses must have lost their leader 
relatively recently. The close evolutionary relationship 
between toro- and coronaviruses suggests that this 
event took place after the Coronaviridae and Arteriviri- 
dae diverged. Alternatively, the common ancestor of 
the Nidovirales may have used a leader-independent 
transcription mechanism and arteri- and coronavi- 
ruses acquired a 5’ leader independently. In either 
view, the addition of noncontiguous leader sequences 
would not be a mechanistically important aspect of 
mRNA synthesis (as suggested by the “leader-primed” 
transcription model) (8) but rather a modification of a 
common transcription scheme, based primarily on 
transcriptase-promoter recognition (9,60). What then 
is the function of the leader sequence? Perhaps the 
discontinuous transcription seen in arteri- and corona- 
viruses has evolved merely to provide each viral 
mRNA with a translational enhancer, allowing efficient 
competition with host mRNAs for the cellular transla- 
tional machinery. Indeed, there is evidence that the 
coronavirus leader sequence stimulates viral transla- 
tion in cis, possibly in conjunction with a virus- 
specified or virus-induced factor (61). 
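
The end product of discontinuous transcription, a leader joined to a body at the TAS, can be mimicked with plain string operations. The Python sketch below is an illustration only. The genome and the motif `UUAACC` are arbitrary placeholders rather than a real TAS, and the code says nothing about which of the proposed transcription models is correct. It assumes that the first TAS copy marks the 3’ end of the leader and that fusion leaves a single TAS copy at the junction.

```python
def leader_body_fusions(genome, tas):
    """Build toy sg mRNAs by joining the leader to each downstream body.

    The leader is taken to run from the 5' end through the first TAS
    copy. Each further TAS copy marks the start of a body, and the
    fused product keeps only one TAS at the junction.
    """
    sites = []
    pos = genome.find(tas)
    while pos != -1:
        sites.append(pos)
        pos = genome.find(tas, pos + 1)
    if not sites:
        return []
    leader = genome[:sites[0] + len(tas)]
    return [leader + genome[site + len(tas):] for site in sites[1:]]


# Arbitrary placeholder motif and genome, invented for this example.
toy_tas = "UUAACC"
toy_genome = ("GGAUACG" + toy_tas + "AAAAGGGG" + toy_tas
              + "CCCCGGGG" + toy_tas + "GGCCAAAA")
toy_mrnas = leader_body_fusions(toy_genome, toy_tas)
```

All products share the same 5’ leader and the same 3’ end, and they differ only in how much body sequence lies between the two.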

For a complete understanding of Nidovirales tran- 
scription-initiation, studies on torovirus mRNA synthe- 
sis will be pivotal. In fact, the existence of a small 
torovirus leader RNA cannot entirely be excluded. 
Sequence analysis of ETV defective interfering RNAs,
combined with results of primer extension studies,
suggests that a TAS is present at the extreme 5’ end of
the viral genome, which could give rise to a leader of
approximately 8 nt (59).


5’ and 3’ Nontranslated Regions 


The promoters required for genome replication are 
commonly found at the 5’ and 3’ ends of the genome. 
Coronaviruses have nontranslated regions (NTRs) rang- 
ing from 0.2 to 0.5 kb (5’) and from 0.3 to 0.5 kb (3’). 
Their primary structure is poorly conserved among the 
different subgroups. Deletion mapping studies using 
synthetic DI RNAs suggest that for the group II 
coronaviruses, about 0.5 kb of each end of the genome 
is required for replication, implying that promoter 
elements may extend into ORF1a and the N gene
(10,62-66). All coronavirus genome RNAs have the
sequence 5' U/GGGAAGAGC 3’ about 70 nt upstream 
of the poly(A) tail (67,68). The strict conservation of 
this sequence element suggests that it has a role in 
replication. Surprisingly, however, the 3’ most 55 nt of 
the 3’ NTR of MHV appear to be sufficient to drive 
minus-strand synthesis (69). 

The 3’ NTRs of toroviruses are about 0.3 kb. The 5’ 
NTR of ETV strain Berne is 0.8 kb (59) but the lengths 
of 5’ NTRs of other toroviruses are unknown. The 5’ 
NTRs of arteriviruses are about 0.2 kb and, unlike 
those of coronaviruses, consist almost entirely of the 
leader (37-39,70). The 3’ NTRs of arteriviruses are also 
short, ranging from 59 to 151 nt, and conserved 
sequence elements have not been found. 


POLYPROTEIN PROCESSING: 
THE POLYMERASE GENE 


The overlapping ORFs 1a and b found at the 5’ end 
of the nidoviral genome are frequently referred to as 
the “polymerase gene.” However, there is little doubt 
that the processing of the encoded polyproteins yields 
proteins required for RNA synthesis as well as a 
number of products involved in other aspects of virus 
replication. The 1a and 1b polyproteins of coronavi- 
ruses are 3951 to 4492 and 2682 to 2714 residues long, 
respectively. POL1b of ETV strain Berne consists of 
2289 residues; only limited sequence data are available 
for torovirus ORF1a. The polyproteins of arteriviruses
are much smaller, with lengths of 1727-2396 (POL1a) 
and 1411-1459 (POL1b) residues. 

Amino acid sequence comparisons show that the 1b 
polyproteins of corona-, toro-, and arteriviruses are 
basically colinear (37,41) (Fig. 1b). The sequence conser- 
vation between the more closely related corona- and 
toroviruses is clustered in six domains, four of which 
are also found in the arterivirus POL1b: the “classical” 
RNA-dependent RNA polymerase (RdRp) and heli- 
case (H) domains, which are also present in the 
polymerases of most other viruses, a zinc finger motif 
(zf), and a short region of 80-100 residues, which has 
not yet been identified in other viral polymerases and 
was called the “coronavirus-like” (CVL) domain (3) 
(motif 2 in Fig. 1b). 


Processing of Coronavirus POL1a Polyproteins
by Papain-like Proteinases 


There is little sequence conservation among the
N-termini of the POL1a polyproteins of the three
coronavirus subgroups. Size differences can mostly be
attributed to these regions (Fig. 2) and sequence
similarities are limited to papain-like cysteine proteinase
(pcp) domains (33,34,36,71). The POL1a polyproteins
of HCV 229E, TGEV, FIPV (subgroup I), and MHV
(subgroup II) have two pcp domains, whereas that of IBV
(subgroup III) contains a single pcp domain. These pcp
domains seem to be involved in the processing of the
N-termini of the 1a polyproteins.

The proteolytic cleavage of the N-terminus of the
coronavirus 1a polyprotein has been studied in most
detail for MHV. In vitro translation of genomic RNA
gave products of 28 and 220 kDa and the production of
p28 was sensitive to proteinase inhibitors, suggesting
that it arose by a proteolytic cleavage (72). p28 was also
detected in MHV-infected cells (73). Partial peptide
mapping revealed that p28 is derived from the N-terminus
of POL1a (74). Baker et al. (75) subsequently showed
that the proteolytic activity responsible for the production
of p28 mapped to residues 1223-1695 of POL1a
which contains the N-terminal-most pcp domain (pcp1)
(33). Mutagenesis showed that any change of either
Cys1137 or His1288 (Cys1121 and His1272 of MHV-A59)
(35,76) resulted in the loss of proteinase activity,
suggesting that these residues form the catalytic dyad
(77). Cleavage to give p28 was at an RGV motif at the
G247/V248 dipeptide bond (78,79), and presumably
occurred in cis (75). Reactions of specific antisera raised
against different regions of MHV POL1a with potential
cleavage products with apparent molecular weights of
65, 50, 240, and 290 kDa in MHV-infected cells (80,81)
showed that processing of the N-terminus of POL1a
involves multiple cleavage events. p65 is thought to be
immediately adjacent to p28 (81,82). Gao et al. (82)
reported that p65 of MHV strain JHM is generated
from a p72 precursor, but this precursor has not been
observed by others studying MHV strain A59 (81).
Kinetic analysis suggests that p290 is a precursor to
p50 and p240. A provisional map of the POL1a region
of MHV is shown in Fig. 2. The proteinases involved in
the release of p65, p50, and p240 have not yet been
identified. Although some authors have implicated
pcp1 in the cleavage of p65 (76), this is disputed by
others (82).

Only limited data are available on the processing of
the N-terminus of POL1a of IBV. Using monospecific
antisera raised against residues 49-514 or 247-599, Liu
et al. (83) detected an 87-kDa product in IBV-infected
cells. It is not known if IBV p87 represents the N-terminal
cleavage product or if an additional smaller product
is released from the N-terminus of POL1a. p87 was
also found upon in vivo expression of the N-terminal
1742 residues of IBV POL1a (83), which include the
pcp domain (33,71). Interestingly, p87 was not detected
after in vivo expression of a shorter N-terminal polypeptide
of 1444 residues that lacked pcp, strongly suggesting
that pcp is involved in the release of this product.
Because p87 did not appear when the 1742-residue
polypeptide was produced by in vitro translation,
cellular factors may also be involved in this cleavage
event. However, in vivo processing of this polypeptide
was also inefficient, possibly because the pcp is located
at the C-terminus of the 1742-residue expression product
and sequences downstream of this domain are
required for optimal proteolytic activity.

FIG. 2. Proteolytic processing of the coronavirus polyproteins POL1a and POL1ab. Provisional cleavage maps were constructed on the basis of
the combined data discussed in the text. POL1a and POL1b sequences are indicated by boxes. Papain-like (pcp) and 3C-like cysteine proteinase
(3clp) domains are indicated by shading, as are the RNA-dependent RNA polymerase (RdRp), zinc finger (Zf), and helicase domains (H). Also
shown are the hydrophobic domains, mp1 and mp2, that flank 3clp. Cleavage sites that have been identified experimentally either by protein
sequence analysis or by site-directed mutagenesis are indicated by black arrows. White arrows indicate cleavages for which the exact cleavage
site has not been determined. Cleavage products are designated after their apparent molecular weight as determined by SDS-PAGE. Proteinases
involved in each cleavage event are given. Question marks indicate cleavages for which the proteinase has not yet been identified. Open
arrowheads indicate predicted cleavage sites for 3clp.

In our laboratory, a monospecific antiserum, raised
against the N-terminal 198 residues of the 1a polyprotein
of FIPV, specifically recognized products of 12, 83,
and 100 kDa in FIPV-infected cells. These products were
also found upon in vivo expression of the N-terminal
1446 residues of FIPV POL1a containing the pcp1
domain. Kinetic analysis suggested that p12 and p83
are mature products with p100 as their precursor. p12
reacted with antiserum raised against the N-terminal
15 residues of POL1a, showing it to be the N-terminal-most
cleavage product. pcp1 appears to be involved in
the release of both p12 and p83. Substitution of the
presumptive catalytic cysteine residue of pcp1
completely abolished proteolytic activity (Fig. 2; De
Groot et al., in preparation).


Processing of the Coronavirus Polyproteins 
1a and 1ab by the 3C-like proteinase 


In contrast to the N-termini, the C-terminal third of
coronavirus POL1a polyproteins is well conserved.
All contain a proteinase domain flanked by hydropho- 
bic regions, designated mp1 and mp2 (Fig. 2). This 
proteinase is related to the chymotrypsin-like serine 
proteases, but with a cysteine rather than a serine 
residue as the active site nucleophile (33,34,36,71,84). A 
similar situation exists in the 3C proteinases of picorna- 
viruses and 3C-like proteinases of plant viruses (85). 

The 3C-like proteinases (3clp) of coronaviruses are 
involved in the processing of the C-terminus of POL1a 
and of POL1ab. The results obtained for IBV, MHV, 
and HCV 229E differ only in details. The 3clp mediates 
at least four cleavage events. It autocatalytically ex- 
cises itself from the polyprotein precursor, yielding 
products of 35, 27, and 34 kDa for IBV, MHV, and HCV 
229E, respectively (86-89) (Fig. 2). The release of IBV 
3clp (but not that of MHV) from a synthetic precursor 
in vitro was dependent on the presence of microsomal 
membranes and apparently required membrane- 
association of the flanking lipophilic domains (86,87). 
Lu et al. (88) proposed that because production of the 
MHV p27 in vitro was sensitive to dilution, the autocatalytic
release of 3clp occurs mainly in trans. Protein
sequence analysis identified Q3333/S3334 and Q2965/A2966
as the respective N-terminal cleavage sites of MHV
p27 and HCV p34 with the Gln residues in the P1
position (87,89). p35 of IBV is generated by cleavage of 
QS dipeptides at positions 2779-2780 and 3086-3087 
(86). The cleavage sites flanking 3clp are well con- 
served among the different coronaviruses. 

Processing of the POL1ab polyprotein by 3clp also
resulted in the production of a polypeptide of c. 100
kDa, containing the RdRp domain (90-92). The cleavage
sites for IBV and HCV 229E were at the positionally
conserved dipeptides Q3928/S3929 and Q4868/S4869 or
Q4068/S4069 and Q4995/A4996, respectively, the N-terminal
most of which are located in POL1a (Fig. 2). Processing
leading to the release of the RdRp can occur in trans,
both in vitro and in vivo (91,92).

Gorbalenya et al. (71) predicted that the catalytic site
of the IBV 3clp consists of a triad formed by His2820,
Glu2843, and Cys2922. The Cys and His residues are
conserved in the 3clp domain of the other coronaviruses
and their involvement in proteolysis has been
confirmed by site-directed mutagenesis (86,87,89,91).
Glu2843 is not part of the catalytic site. This residue is
not conserved in other 3clp and substitution by Asn,
Asp, or Gln did not affect proteolytic activity (91). In
agreement with the assumed evolutionary relationship
with cellular trypsin-like serine proteases, the coronavirus
3clp are sensitive to both serine and cysteine
protease inhibitors (86,88). Moreover, substitution of
the active site Cys by Ser yielded an IBV 3clp which
was still partially active (86).

The cleavage sites of the coronavirus 3clp conform to 
the consensus XQZ, with X being a hydrophobic 
residue (L, V, I, M or F) and Z a small uncharged 
residue (S, A, G or C). These data provide experimental 
support to earlier predictions (33,71). Alignment of 
POL1ab sequences suggests that 3clp may cleave at 
seven additional conserved sites (Fig. 2). Cleavage at 
the sites in MHV POL1a would produce four extra 
polypeptides with predicted molecular weights of 33, 
10, 34, and 15 kDa. The 33-kDa product would contain 
the hydrophobic domain mp2, whereas the 15-kDa 
product would be a cysteine-rich polypeptide resem- 
bling murine epidermal growth factor in sequence 
(71). Processing of POL1b would yield the RdRp and 
four other products. The zinc finger and helicase 
motifs would be in a product of about 67 kDa and the 
conserved motif 1 would be in a polypeptide of 59 
kDa, whereas motifs 2 (the CVL domain) and 3 would 
be in products of 42 and 33 kDa, respectively (Figs. 1 
and 2). The latter may correspond to a 33-kDa protein 
in lysates of MHV-infected cells which reacted with 
antiserum against the 14 C-terminal amino acids of 
POL1b (93). 
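
The XQZ consensus lends itself to a simple pattern search. The Python sketch below scans a protein sequence for candidate sites and cuts it after each P1 glutamine. The toy sequence and the function names are made up for illustration. A match to the consensus is only a prediction, since the text does not claim that every XQZ tripeptide is cleaved.

```python
import re

# X = L, V, I, M or F; Q is the P1 residue; Z = S, A, G or C.
# The lookahead keeps Z out of the match, so m.end() is the 1-based
# position of the P1 Gln and the cut falls directly after it.
SITE = re.compile(r"[LVIMF]Q(?=[SAGC])")


def candidate_sites(protein):
    """Return the 1-based positions of the P1 Gln of each XQZ match."""
    return [m.end() for m in SITE.finditer(protein)]


def split_at(protein, p1_positions):
    """Cut the sequence after each P1 position and return the fragments."""
    fragments = []
    previous = 0
    for position in p1_positions:
        fragments.append(protein[previous:position])
        previous = position
    fragments.append(protein[previous:])
    return fragments


# Hypothetical toy sequence; "IQD" near the end does not match because
# D is not one of the small uncharged residues allowed at Z.
toy_protein = "MKTLQSGGAVQAPPFQCKKIQD"
toy_sites = candidate_sites(toy_protein)
toy_fragments = split_at(toy_protein, toy_sites)
```

Applied to an aligned set of real POL1ab sequences, a scan of this kind would list candidate positions that could then be compared for conservation, in the spirit of the alignment-based predictions described above.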


Processing of the Arterivirus Polymerase 
Polyproteins 


Most of what is known about arterivirus polyprotein 
processing stems from the work of Snijder and col- 
leagues on EAV; only limited information is available 
for PRRSV and LDV. As for coronaviruses, most 
sequence variation occurs in POL1a. Processing of the 
N-terminus of POL1a is mediated by papain-like
cysteine proteinases, whereas the C-terminus of POL1a
and the conserved 1b polyprotein are probably processed
by a 3C-like proteinase which is located at the
C-terminus of POL1a and flanked by hydrophobic
domains (Fig. 3).

For both PRRSV and LDV (38,39), the N-terminus of
POL1a contains two papain-like proteinase domains,
pcpα and pcpβ, which mediate their own release by
cleavage in cis at C-terminal cleavage sites, giving rise
to products nsP1α and nsP1β (Fig. 3) (94). The PRRSV
and LDV leader proteinases share 48% sequence identity.
For PRRSV, Cys76 and His146 are crucial for pcpα
activity (94), whereas cleavage by pcpβ was dependent
on Cys276 and His345. For LDV, the active-site cysteines
of both proteinases were identified. The cleavage sites in
POL1a have not been mapped but, from the sizes of
nsP1α and nsP1β and from the results of deletion
analyses, are predicted to be around position 170 for
pcpα, between Tyr384 and Gly385 for PRRSV pcpβ,
and between Tyr380 and Gly381 for LDV pcpβ.

EAV is thought to have a single leader proteinase
(37), corresponding to pcpβ of LDV and PRRSV.
However, relicts of nsP1α are still present in the
N-terminus of EAV POL1a (94). The EAV pcpβ homologue
releases a 29-kDa protein, nsP1 (Fig. 3), apparently
exclusively by cleavage in cis at G260/G261. The
results of site-directed mutagenesis suggested that
Cys164 and His230 form the catalytic dyad (95).

Four additional mature cleavage products were
identified in lysates of EAV-infected cells (96) and were
designated nsP2 to 5 (Fig. 3). The 61-kDa nsP2 protein
is released by cleavage between Gly831 and Gly832 and
the catalytic activity responsible is within the N-terminal
165 residues of nsP2 as this domain can induce
cleavage at the 2/3 site in trans (97). Sequence comparisons
suggested that the catalytic residues in the cysteine
proteinase domain were Cys270 and His332. Substitutions
of these residues completely abolished proteolytic
activity, but so did replacement of three other conserved
cysteine residues (positions 319, 349, and 354).
The N- and C-terminal sequences of nsP2 are highly
conserved among EAV, LDV, and PRRSV. In contrast,
the middle portions differ markedly in size (210-670
residues) and sequence (37-39) (Fig. 3), suggesting that
nsP2 has species-specific rather than genus-specific
functions (94). Multiple sequence alignments suggest
that the nsP2/nsP3 cleavage sites for LDV and PRRSV
are Gly-Gly at positions 1286/1287 and 1462/1463,
respectively.

FIG. 3. Proteolytic processing of the arterivirus polyproteins POL1a and POL1ab. The (provisional) cleavage maps were constructed on the
basis of the combined data discussed in the text. POL1a and POL1b sequences are indicated by boxes. POL1a cleavage products are numbered
according to Snijder et al. (96). Also shown are the apparent molecular weights of the cleavage products. The papain-like proteinase domains
(pcp) and the nsP2 cysteine (cp) and the nsP4 serine proteinases (sp) are indicated by shading as are the RNA-dependent RNA polymerase
(RdRp), zinc finger (Zf), and helicase domains (H). Also shown are the hydrophobic domains, mp1 and mp2, that flank nsP4. Cleavage sites that
have been identified experimentally are indicated by black arrows. White arrows indicate cleavages for which the exact cleavage site has not yet
been determined. Cleavages performed by the serine proteinase are given. Arched arrows depict cleavages performed by the leader proteinases.
Open arrowheads indicate predicted sp cleavage sites, black arrowheads mark cleavages possibly performed by a cellular proteinase.

Inhibition of cleavage at the nsP2/3 junction abolishes
downstream proteolytic events, which are probably
all mediated by a 3C-like serine protease (sp) (98)
located within nsP4. Site-directed mutagenesis results
suggest that the catalytic triad of the nsP4 protease
comprises His1103, Asp1129, and Ser1184, while Thr1179 and
His1198 may be involved in substrate recognition. Snijder
et al. (98) further identified three cleavage sites within
POL1a (E1064/G1065, E1268/S1269, and E1677/G1678) and two
additional cleavage sites were proposed in the C-terminus
of POL1a (99). The corresponding cleavage sites in
LDV and PRRSV in Fig. 3 are inferred.

Three putative recognition sequences for the nsP4 
protease were predicted in POL1b. Proteolytic cleav- 
age at these sites would separate the RdRp motif from 
the putative metal binding and helicase domains. 
Reaction with specific antisera detected four possible 
cleavage products designated p80, p50, p26, and p12, 
respectively (Fig. 3), and a number of putative precur- 
sor proteins in lysates of EAV-infected cells (99). The 
most N-terminal cleavage product, p80, contains the 
RdRp domain, and the putative zinc finger and heli- 
case motifs are in the adjacent p50. The CVL domain 
(motif 3; Fig. 1b) is in p26. 


Nidovirales Polyprotein Processing: 
Differences and Common Concepts 


No information is available on the processing of 
POL1b of toroviruses, although the sequence contains 
a number of potential 3clp cleavage sites. Because the 
POL1b sequences of toro- and coronaviruses are colin- 
ear (Fig. 1b), the processing of torovirus POL1b is 
likely to be very similar to that of coronaviruses. There 
are some marked differences between Coronaviridae 
and Arteriviridae. The latter lack a cleavage product 
containing motif 1 (Figs. 1b and 1c). Moreover, it 
remains to be seen whether the C-terminal POL1b 
cleavage products of the Arteri- and Coronaviridae are 
functionally equivalent. 

For the arteri- and coronaviruses, POL1b processing 
would yield a product containing both the helicase 
domain and the zinc finger motif. Such a combination 
is rare, but not unprecedented as it has also been seen 
in glh-1, a putative RNA helicase from Caenorhabditis 
elegans (100), and the (putative) yeast RNA helicases 
Yer176W (101) and NAM7 (102,103). Most helicases 
lack zinc finger motifs, and it is therefore unlikely that 
the zinc fingers are required for helicase activity (100). 
Instead, they may confer sequence specificity, for 
example, in promoter recognition. 


GENES EXPRESSED FROM 
SUBGENOMIC mRNAs 


ORFs and Coding Assignments 


The arteriviruses PRRSV, LDV, and EAV each pos- 
sess six genes, numbered 2-7 from the 5’ end, that are 
expressed from subgenomic mRNAs (37-39,44). These 
ORFs usually overlap (Fig. 1a). ORFs 2, 5, 6, and 7 are 
conserved among all arteriviruses and, using EAV 
terminology, code for GS, GL, M, and N, respectively 
(21,22,24,104,105). Sequence similarity can be detected 
only at the amino acid level; the conservation is 
generally low and, especially in the EAV proteins, 
restricted to short domains. ORFs 3 and 4 are con- 
served among PRRSV, LDV, and SHFV and code for 
membrane glycoproteins, which in the case of PRRSV, 
are present in purified virions (106,107). The ORF4 
product of EAV shares no obvious sequence similarity 
with that of the other arteriviruses and has not been 
detected in virus preparations. Surprisingly, SHFV 
possesses three additional ORFs. From the limited 
sequence similarities and the apparent positional con- 
servation of cysteine residues it appears that these 
ORFs have arisen from a heterologous RNA recombina- 
tion event by which ORFs 2-4 were duplicated (E. 
Godeny, personal communication). 

Toroviruses apparently express only four genes from 
subgenomic mRNAs, all of which encode structural 
proteins. ETV and BoTV are genetically and serologi- 
cally closely related and share 84% sequence identity 
in the 3’-most 3 kb of their genomes (145). PoTV is 
more distant as judged from the sequence of its 
nucleocapsid protein, which is only 68% identical to 
those of the other two viruses (Kroneman et al., 
unpublished). Snijder et al. (108) noted the presence of 
a small ORF completely contained within the N gene 
of ETV. This ORF, which would encode a hydrophobic 
polypeptide of approximately 10 kDa, is conserved in 
BoTV but abrogated by a termination codon in PoTV. 
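
An ORF nested inside another gene, such as the small ORF within the ETV N gene, can be located by scanning each reading frame separately. The fragment below is a minimal sketch under stated assumptions: the function name and the toy sequence are hypothetical, only ATG is treated as a start codon, and the three standard stop codons are used. It does not reproduce the analysis of Snijder et al. (108).

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def orfs_in_frame(seq, frame, min_codons=1):
    """Return (start, end) pairs for ATG...stop ORFs in one reading frame.

    Coordinates are 0-based and half-open; `end` includes the stop codon.
    `frame` is 0, 1, or 2. `min_codons` counts sense codons, including ATG.
    Hypothetical helper for illustration only.
    """
    seq = seq.upper().replace("U", "T")
    found = []
    start = None
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if start is None and codon == "ATG":
            start = i
        elif start is not None and codon in STOP_CODONS:
            if (i - start) // 3 >= min_codons:
                found.append((start, i + 3))
            start = None
    return found

# Made-up sequence: a frame-0 ORF with a short ORF nested in the +1 frame.
toy = "ATGCATGGCCTAAGCTGA"
outer = orfs_in_frame(toy, 0)
inner = [orf for orf in orfs_in_frame(toy, 1)
         if any(o[0] <= orf[0] and orf[1] <= o[1] for o in outer)]
print(outer, inner)
```

Applied to aligned homologues, the same scan would show whether an internal ORF is kept open or interrupted by a termination codon.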

Coronaviruses possess up to nine ORFs that are 
expressed from sg mRNAs. Of these, the genes for only 
the main structural proteins are conserved among the 
three subgroups (sequence identities of approximately 
30%) as is their relative position in the genome (5’ 
S-E-M-N 3’). Apparently, as coronaviruses diverged, 
subgroup-specific sets of accessory genes were ac- 
quired (5,7,109). For instance, the HE gene and ORF2a, 
which encodes a cytoplasmic nonstructural phospho- 
protein of about 30 kDa (16,110,111) (Fig. 1), are only 
found in group II viruses. Differences in gene composi- 
tion occur even among viruses of the same subgroup. 
In CCV and FCoV, ORFs 7a and 7b are at the 3’ end of 
the genome (112,113), but TGEV, which is serologically 
and genetically very closely related to CCV and FCoV, 
lacks 7b (67). HCV 229E lacks both ORFs (68). 

All accessory genes tested thus far are dispensable 
for replication in vitro and in vivo (16,114-119). The 
functions of the encoded proteins are poorly under- 
stood, but at least some may be involved in virus-host 
interactions and thus contribute to viral fitness. For 
example, the 7b gene of FCoV codes for a nonstruc- 
tural 26-kDa secretory glycoprotein (120). FCoV vari- 
ants that lack ORF7b readily arise in tissue culture, but 
among naturally occurring FCoV strains, the gene is 
strictly maintained and its loss correlates with reduced 
virulence (118). 

In contrast to the other Nidovirales, a number of 
coronaviruses have polycistronic mRNAs which con- 
tain up to three ORFs clustered in a single transcription 
unit. Downstream ORFs are usually translated by 
leaky scanning but the synthesis of the E proteins of 
IBV (ORF 3c) and MHV (ORF 5b) may involve internal 
initiation of translation mediated by a ribosomal land- 
ing pad (5,121-123). The N gene of some group II 
coronaviruses contains a small internal ORF in the +1 
reading frame (Fig. 1) that is expressed in infected cells 
(124,125). It encodes a hitherto unrecognized structural 
protein that is not essential for virus replication in vitro 
and in vivo (119). 


RNA Recombination: A Driving Force 
in Nidovirales Evolution 


The variation in coronavirus gene composition is 
probably the result of heterologous RNA recombina- 
tion events during which gene modules (126) were 
obtained either from nonrelated viruses or from the 
host. The most compelling example is the HE gene, the 
product of which is 30% identical to the N-terminal 
subunit of the hemagglutinin-esterase fusion protein 
(HEF) of influenza C virus (ICV) (16). Heterologous 
RNA recombination events must also have taken place 
during torovirus evolution. A 0.5-kb remnant of an HE 
gene was found in the ETV genome (20) and an intact, 
functional HE gene of 1.2 kb is present in the genome 
of BoTV (Fig. 1; 145). The torovirus HE protein shares 
30% sequence identity with both the influenza C virus 
HEF and the coronavirus HE. In addition, sequences 
related to ORF2a of group II coronaviruses were found 
at the 3’ end of ETV ORF 1a (20) (Fig. 1). The HE and 
the ORF2a-related sequences found in corona- and 
toroviruses were probably not inherited from a com- 
mon ancestor, but acquired through separate heterolo- 
gous RNA recombination events (6,20) because (i) the 
genes are in different positions in the two virus 
genomes (Fig. 1) and (ii) it is highly unlikely that genes 
retained during the considerable evolutionary diver- 
gence between corona- and toroviruses would have 
been lost from the genomes of coronavirus subgroups I 
and III. 

The differences among the main structural proteins 
of the Nidovirales could also be explained by heterolo- 
gous recombination (3). A switch from an arterivirus- 
like isometric nucleocapsid structure to the extended 
helical nucleocapsid structures of the Coronaviridae 
may have been a determining step in the divergence of 
the Nidovirales (38). Removal of constraints on ge- 
nome size would have allowed toro- and coronavirus 
ancestors to acquire large genomes and thus develop 
the variation in gene composition seen today. A rela- 
tively recent replacement of the N gene may subse- 
quently have led to the divergence of the toro- and 
coronaviruses. 

Homologous RNA recombination (128,129) may also 
be an important force in Nidovirales evolution. High 
frequency recombination of coronavirus genomes has 
been observed in tissue culture (130,131), in experimen- 
tally infected animals (132) and in embryonated eggs 
(133). Homologous recombination allows the rapid 
exchange of beneficial mutations and also serves as a 
correction mechanism counteracting Muller’s ratchet 
(134). There is evidence that homologous recombina- 
tion occurs in IBV genomes in the field (135,136,146) 
and a genetic exchange between CCV and FCoV 
serotype I strains may have resulted in the emergence 
of a new FCoV serotype (118,137,138). 


CONCLUDING REMARKS AND FUTURE 
PERSPECTIVES 


The nidoviral replicase module has given rise to 
viruses that utilize similar replication strategies and 
yet differ markedly in genetic complexity. Common to 
the Nidovirales is the use of a nested set of mRNAs. 
This property, often regarded as “unique,” is shared 
with the phylogenetically unrelated closteroviruses, a 
genus of filamentous RNA viruses of plants (139,140). 
Closteroviruses have genomes of up to 20 kb in length, 
thus approaching the Coronaviridae in genetic com- 
plexity. They also resemble the Nidovirales in genome 
organization and expression, including the use of large 
polymerase polyproteins, encoded by two overlapping 
ORFs located at the 5’ end of the genome, and 
down-regulation of RdRp synthesis by ribosomal 
frameshifting. These recent findings underscore the 
power of convergent evolution and indicate that simi- 
larities in genomic organization and common mecha- 
nisms of gene expression and regulation are not 
reliable taxonomic criteria by themselves. Even the 
results of comparative sequence analysis should be 
regarded with caution. Alignments of RdRp domains 
have been presented to illustrate the evolutionary 
relationship between the Nidovirales, but the phyloge- 
netic signal in this domain is not sufficient to support a 
common ancestry of corona- and arteriviruses (141). 
Here, the toroviruses provide the “missing link” and 
thus justify a phylogenetic grouping of corona-, toro-, 
and arteriviruses (141) (P. Zanotto, personal communi- 
cation). 

The analyses of Nidovirales genomes and the stud- 
ies on polyprotein processing have led to the identi- 
fication of many viral proteins, some of which are 
conserved and some of which are genus- or even 
species-specific. The next formidable task will be to 
determine the function of each of these products. What 
is the added value of the nonconserved POL1a- 
derived cleavage products? Are they antagonists of the 
intracellular antiviral response or involved in host 
shutoff? What are the functions of the proteins derived 
from POL1b? Why are proteins containing motifs 1 
and 3 lacking in arteriviruses and what are the conse- 
quences for replication and transcription? Are replica- 
tion and transcription distinct processes? Is there a 
developmental shift from replication to transcription 
and if so, how is this regulated? What is the function of 
the various accessory genes of coronaviruses and how 
do they contribute to viral fitness? Many of these 
questions may well be solved in the near future. Both 
in Leiden (147) and in Utrecht (Glaser et al., in prepara- 
tion), full-length cDNA clones of the EAV genome 
have been constructed, from which infectious tran- 
scripts can be derived. For coronaviruses, no such 
clones are yet available. However, homologous RNA 
recombination can be exploited to introduce site- 
specific mutations into the viral genome using synthetic 
(DI) RNAs as donor sequence (65,119,142,143). 
Targeted RNA recombination provides an attractive 
strategy to characterize the various ts-mutants of MHV 
(65). Undoubtedly, the recent development of methods 
to study arteri- and coronaviruses by reverse genetics 
heralds a new era in Nidovirales research. 